Learning Rates for Stochastic Gradient Descent with Nonconvex Objectives
Similar Resources
On Nonconvex Decentralized Gradient Descent
Consensus optimization has received considerable attention in recent years. A number of decentralized algorithms have been proposed for convex consensus optimization. However, for consensus optimization with nonconvex objective functions, our understanding of the behavior of these algorithms is limited. When we lose convexity, we cannot hope to obtain globally optimal solutions (though we st...
Online Learning, Stability, and Stochastic Gradient Descent
In batch learning, stability together with existence and uniqueness of the solution corresponds to well-posedness of Empirical Risk Minimization (ERM) methods; recently, it was proved that CVloo stability is necessary and sufficient for generalization and consistency of ERM ([9]). In this note, we introduce CVon stability, which plays a similar role in online learning. We show that stochastic g...
Learning Rate Adaptation in Stochastic Gradient Descent
The efficient supervised training of artificial neural networks is commonly viewed as the minimization of an error function that depends on the weights of the network. This perspective gives some advantage to the development of effective training algorithms, because the problem of minimizing a function is well known in the field of numerical analysis. Typically, deterministic minimization metho...
Gradient Descent with Proximal Average for Nonconvex and Composite Regularization
Sparse modeling has been highly successful in many real-world applications. While much interest has focused on convex regularization, recent studies show that nonconvex regularizers can outperform their convex counterparts in many situations. However, the resulting nonconvex optimization problems are often challenging, especially for composite regularizers such as the nonconvex overlapping gr...
Stochastic Gradient Descent with GPGPU
We show how to optimize a Support Vector Machine and a predictor for Collaborative Filtering with Stochastic Gradient Descent on the GPU, achieving speedups of 1.66 to 6 times compared to a CPU-based implementation. The reference implementations are the Support Vector Machine by Bottou and the BRISMF predictor from the Netflix Prize winning team. Our main idea is to create a hash function of ...
Journal
Journal title: IEEE Transactions on Pattern Analysis and Machine Intelligence
Year: 2021
ISSN: 0162-8828,2160-9292,1939-3539
DOI: 10.1109/tpami.2021.3068154